Young, Nick

Nicholas Young is a PhD candidate at Michigan State University studying physics education research and computational mathematics, science, and engineering. After graduation, he hopes to become a physics professor. He plans to use what he learned from PDP to create equitable and inclusive activities for his classes.

 

youngn18@msu.edu

 

 

Teaching Activity Summary

Name of Teaching Activity: Data-driven models: Decision Trees

Teaching Venue & Date: Columbia Summer Research PREP. May 28, 2019

Learners: 14 undergraduate students.

Reflection on teaching and assessing the core science or engineering concept:

Our main content goal was for students to build decision trees to predict some feature of a dataset and then to justify how they know their decision tree has the maximum accuracy. Machine learning methods are becoming popular in many fields of science, including astronomy. Because many machine learning methods are “black boxes,” it can be difficult for students to develop a conceptual understanding of how these methods work. Decision trees, however, are a “glass box” method in which the user can see each step in the process, which makes them an ideal way to introduce students to machine learning. Further, despite being one of the simplest machine learning methods, decision trees are used in astronomical research (see [1] and [2]).

To help students understand machine learning methods, we developed an activity in which students first gained experience creating a simple machine learning model in Python using the Iris dataset, a simple dataset used to predict the type of flower from its petal length and width and its sepal length and width. They then applied those skills to an actual physics or astronomy dataset. Throughout the activity, the learners worked in pairs, as pair programming has been shown to lead to greater learning and reduced frustration [3]. To assess our learners, we created a rubric with three dimensions: 1) building a decision tree using multiple variables, 2) computing the accuracy of their model and using it to assess how good the model is, and 3) changing the values of the parameters used to build the model to achieve the maximum accuracy. Each dimension was assessed on a binary scale, that is, whether or not the student showed sufficient evidence of proficiency.
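The three rubric dimensions can be sketched in a few lines of Python. This is only a minimal illustration, not the notebook used in the activity: it assumes the scikit-learn library, and the train/test split, the random seed, and the choice of `max_depth` as the parameter to vary are all assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data: four measurements per flower and one of three flower types.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out part of the data so accuracy is measured on flowers the tree has not seen.
# test_size and random_state are illustrative choices, not values from the activity.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def fit_and_score(max_depth):
    """Build a tree from all four variables and return its test accuracy."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)                            # dimension 1: build the tree
    return accuracy_score(y_test, tree.predict(X_test))   # dimension 2: accuracy


# Dimension 3: vary a parameter (here the maximum depth) and compare accuracies.
accuracies = {depth: fit_and_score(depth) for depth in (1, 2, 3, 4, 5)}
best_depth = max(accuracies, key=accuracies.get)

for depth, acc in accuracies.items():
    print(f"max_depth={depth}: test accuracy = {acc:.2f}")
print(f"Highest test accuracy at max_depth={best_depth}")
```

A tree with `max_depth=1` asks only one question and so can predict at most two of the three flower types, which is one concrete reason accuracy improves when the depth is increased.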

When assessing our students, we found that they were able to complete all but one of the intended tasks. All of our students were able to build a decision tree and could manually navigate the tree to make a prediction about new data we provided. All of the students were also able to compute the accuracy of their models and knew that a higher accuracy was better for the datasets we provided. However, when it came to making the best model, the students encountered some difficulties. While the students were able to vary the parameters and see how the accuracy changed, they were not able to explain why the accuracy changed based on the parameters they chose. In future iterations of this activity, we would want to spend more time describing how the decision tree is made and what factors affect the accuracy. As many students did not make it through the entire activity, we did not have time to go into as much detail as we would have liked.
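One way to describe how a decision tree is made is to show how a single split is chosen. The sketch below is an assumption about how this could be presented, not part of the original activity: it uses Gini impurity, a common splitting criterion, and a small made-up dataset with hypothetical labels "A" and "B".

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a group with one label, larger for more mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try a split halfway between each pair of adjacent distinct values.

    Returns (threshold, weighted impurity) for the split whose two groups
    are least mixed, or (None, impurity of the unsplit data) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data, made up for illustration only.
petal_length = [1.4, 1.5, 1.3, 4.5, 4.7, 5.1]
flower_type = ["A", "A", "A", "B", "B", "B"]

threshold, impurity = best_threshold(petal_length, flower_type)
print(f"Split at petal length <= {threshold} (weighted impurity {impurity})")
```

A full tree repeats this search within each resulting group. Parameters such as the maximum depth limit how many times the search is repeated, which is why changing them changes the accuracy.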

 

[1] E. C. Vasconcellos et al. 2011 https://doi.org/10.1088/0004-6256/141/6/189

[2] A. A. Suchkov, R. J. Hanisch, and B. Margon 2005 https://doi.org/10.1086/497363

[3] L. Williams and R. L. Upchurch 2001 https://doi.org/10.1145/366413.364614